{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "e057f306",
   "metadata": {},
   "source": [
    "# MongoDB Demo\n",
    "\n",
    "This guide shows you how to directly use our `DocumentStore` abstraction backed by MongoDB. By putting nodes in the docstore, this allows you to define multiple indices over the same underlying docstore, instead of duplicating data across indices.\n",
    "\n",
    "<a href=\"https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/docstore/MongoDocstoreDemo.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c0bcc1a5",
   "metadata": {},
   "source": [
    "If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9a95d78b",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install llama-index-storage-docstore-mongodb\n",
    "%pip install llama-index-storage-index-store-mongodb\n",
    "%pip install llama-index-llms-openai"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2ef52064",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install llama-index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a54d1c43-4b7f-4917-939f-a964f6f3dafc",
   "metadata": {},
   "outputs": [],
   "source": [
    "import nest_asyncio\n",
    "\n",
    "nest_asyncio.apply()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fa67fa07-1395-4aab-a356-72bdb302f6b2",
   "metadata": {},
   "outputs": [],
   "source": [
    "import logging\n",
    "import sys\n",
    "import os\n",
    "\n",
    "logging.basicConfig(stream=sys.stdout, level=logging.INFO)\n",
    "logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1d12d766-3ca8-4012-9da2-248be80bb6ab",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import SimpleDirectoryReader, StorageContext\n",
    "from llama_index.core import VectorStoreIndex, SimpleKeywordTableIndex\n",
    "from llama_index.core import SummaryIndex\n",
    "from llama_index.core import ComposableGraph\n",
    "from llama_index.llms.openai import OpenAI\n",
    "from llama_index.core.response.notebook_utils import display_response\n",
    "from llama_index.core import Settings"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2e146b01",
   "metadata": {},
   "source": [
    "#### Download Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "79d2484e",
   "metadata": {},
   "outputs": [],
   "source": [
    "!mkdir -p 'data/paul_graham/'\n",
    "!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "f6dd9d5f-a601-4097-894e-fe98a0c35a5b",
   "metadata": {},
   "source": [
    "#### Load Documents"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e7cdaf9d-cfbd-4ced-8d4e-6eef8508224d",
   "metadata": {},
   "outputs": [],
   "source": [
    "reader = SimpleDirectoryReader(\"./data/paul_graham/\")\n",
    "documents = reader.load_data()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "bae82b55-5c9f-432a-9e06-1fccb6f9fc7f",
   "metadata": {},
   "source": [
    "#### Parse into Nodes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f97e558a-c29f-44ec-ab33-1f481da1a6ef",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.node_parser import SentenceSplitter\n",
    "\n",
    "nodes = SentenceSplitter().get_nodes_from_documents(documents)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "aff4c8e1-b2ba-4ea6-a8df-978c2788fedc",
   "metadata": {},
   "source": [
    "#### Add to Docstore"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1514211c",
   "metadata": {},
   "outputs": [],
   "source": [
    "MONGO_URI = os.environ[\"MONGO_URI\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1ba8b0da-67a8-4653-8cdb-09e39583a2d8",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.storage.docstore.mongodb import MongoDocumentStore\n",
    "from llama_index.storage.index_store.mongodb import MongoIndexStore"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "60e781d1",
   "metadata": {},
   "outputs": [],
   "source": [
    "storage_context = StorageContext.from_defaults(\n",
    "    docstore=MongoDocumentStore.from_uri(uri=MONGO_URI),\n",
    "    index_store=MongoIndexStore.from_uri(uri=MONGO_URI),\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e0b18789",
   "metadata": {},
   "outputs": [],
   "source": [
    "storage_context.docstore.add_documents(nodes)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "528149c1-5bde-4eba-b75a-e8fa1da17d7c",
   "metadata": {},
   "source": [
    "#### Define Multiple Indexes\n",
    "\n",
    "Each index uses the same underlying Node."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "316fb6ac-2031-4d17-9999-ffdb827f46d1",
   "metadata": {},
   "outputs": [],
   "source": [
    "summary_index = SummaryIndex(nodes, storage_context=storage_context)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9440f405-fa75-4788-bc7c-11d021a0a17b",
   "metadata": {},
   "outputs": [],
   "source": [
    "vector_index = VectorStoreIndex(nodes, storage_context=storage_context)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "364ef89f-4ba2-4b1a-b5e5-619e0e8420ef",
   "metadata": {},
   "outputs": [],
   "source": [
    "keyword_table_index = SimpleKeywordTableIndex(\n",
    "    nodes, storage_context=storage_context\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5c6b2141-fc77-4dec-891b-d4dad0633b35",
   "metadata": {},
   "outputs": [],
   "source": [
    "# NOTE: the docstore still has the same nodes\n",
    "len(storage_context.docstore.docs)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "365a025b",
   "metadata": {},
   "source": [
    "#### Test out saving and loading"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1b359a08",
   "metadata": {},
   "outputs": [],
   "source": [
    "# NOTE: docstore and index_store is persisted in MongoDB by default\n",
    "# NOTE: here only need to persist simple vector store to disk\n",
    "storage_context.persist()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "84b3d2f4",
   "metadata": {},
   "outputs": [],
   "source": [
    "# note down index IDs\n",
    "list_id = summary_index.index_id\n",
    "vector_id = vector_index.index_id\n",
    "keyword_id = keyword_table_index.index_id"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1593ca1d",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import load_index_from_storage\n",
    "\n",
    "# re-create storage context\n",
    "storage_context = StorageContext.from_defaults(\n",
    "    docstore=MongoDocumentStore.from_uri(uri=MONGO_URI),\n",
    "    index_store=MongoIndexStore.from_uri(uri=MONGO_URI),\n",
    ")\n",
    "\n",
    "# load indices\n",
    "summary_index = load_index_from_storage(\n",
    "    storage_context=storage_context, index_id=list_id\n",
    ")\n",
    "vector_index = load_index_from_storage(\n",
    "    storage_context=storage_context, index_id=vector_id\n",
    ")\n",
    "keyword_table_index = load_index_from_storage(\n",
    "    storage_context=storage_context, index_id=keyword_id\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "d3bf6aaf-3375-4212-8323-777969a918f7",
   "metadata": {},
   "source": [
    "#### Test out some Queries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9bba68f3-2743-437e-93b6-ce9ba92e40c3",
   "metadata": {},
   "outputs": [],
   "source": [
    "chatgpt = OpenAI(temperature=0, model=\"gpt-3.5-turbo\")\n",
    "\n",
    "Settings.llm = chatgpt\n",
    "Settings.chunk_size = 1024"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "544c0565-72a0-434b-98e5-83138ebdaa2b",
   "metadata": {},
   "outputs": [],
   "source": [
    "query_engine = summary_index.as_query_engine()\n",
    "list_response = query_engine.query(\"What is a summary of this document?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "39d250be",
   "metadata": {},
   "outputs": [],
   "source": [
    "display_response(list_response)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "036077b7-108e-4026-9628-44c694343460",
   "metadata": {},
   "outputs": [],
   "source": [
    "query_engine = vector_index.as_query_engine()\n",
    "vector_response = query_engine.query(\"What did the author do growing up?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "42229e09",
   "metadata": {},
   "outputs": [],
   "source": [
    "display_response(vector_response)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ecd7719c-f663-4edb-a239-d2a8f0a5c091",
   "metadata": {},
   "outputs": [],
   "source": [
    "query_engine = keyword_table_index.as_query_engine()\n",
    "keyword_response = query_engine.query(\n",
    "    \"What did the author do after his time at YC?\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "37524641-2632-4a76-8ae6-00f1285256d9",
   "metadata": {},
   "outputs": [],
   "source": [
    "display_response(keyword_response)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
